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Markov chains are convenient means of generating realizations of networks with a given (joint 
or otherwise) degree distribution, since they simply require a procedure for rewiring edges. The 
major challenge is to find the right number of steps to run such a chain, so that we generate truly 
independent samples. Theoretical bounds for mixing times of these Markov chains are too large to 
be practically useful. Practitioners have no useful guide for choosing the length, and tend to pick 
numbers fairly arbitrarily. We give a principled mathematical argument showing that it suffices 
for the length to be proportional to the number of desired number of edges. We also prescribe a 
method for choosing this proportionality constant. We run a series of experiments showing that the 
distributions of common graph properties converge in this time, providing empirical evidence for 
our claims. 

PACS numbers: 89.75.Hc,05.10.Gg 

Keywords: graph generation, Markov chain Monte Carlo, independent samples 



I. INTRODUCTION 

Graphs are a common topological representation across 
a variety of scientific fields. They are used when relations 
between a large number of entities have to be specified 
in a succinct manner. Chemical reactions, molecules, so- 
cial networks (both physical and online) and the electric 
grid are some common examples. In many cases, we may 
have partial information about the graph, requiring gen- 
eration of many graphical realizations consistent with the 
(partial) characterization. Such situations arise in case 
of large online social networks (of which only a small 
part can be sampled tractably) or where the data sim- 
ply cannot be collected e.g., the web of human sexual 
relations (which only allows the estimation of the degree 
distribution). In other cases, privacy concerns can pre- 
vent the distribution of a graph for experimentation and 
or study, e.g., networks of email communications, critical 
infrastructure nets, etc. This gives rise to the problem of 
constructing "sanitized" proxies, that preserve only some 
properties of the original graph. Thus, being able to gen- 
erate independent graphs conditioned on an incomplete 
set of graphical measurements is essential for many ap- 
plications. 

Many graph generation methods aim to preserve the 
salient features of graphs [1, 7, 9, 11, 12, 15, 19, 24, 29]. 
For statistical analysis, we need algorithms that can 
generate uniformly random instances from the space of 
graphs with a specified feature. There has been signifi- 
cant work on random graph generation with a given de- 
gree distribution (DD), which specifies the number of 
vertices with a given degree. In [14], the problem of 
generating a graph with a given degree distribution was 
reduced to a prefect matching problem, which can be 
used to generate random instances by employing results 
in sampling perfect matchings [6, 13]. Alternatively, se- 
quential sampling methods were investigated in [3, 5]. 
These methods can be compute intensive, and in practice, 
Markov chains (MC) are widely used due to their sim- 



plicity and flexibility. The MC is started using a graph 
that honors the specified graphical characteristics, and 
uses a procedure that can "rewire" a graph, to create 
ensembles of graphs with the same degree distribution. 
Taylor [28] showed that edge-swaps could modify a graph 
while preserving its DD. Kannan et al [14] analyzed the 
mixing time of such a MC, whereas Gkantsidis [10] de- 
vised a MC scheme that avoids self loops. In [27], Stan- 
ton and Pinar used an MC to generate an ensemble of 
graphs using a rewiring scheme that preserved the joint 
degree distribution (JDD), which specifies the number 
of edges between vertices of specified degrees. Stanton 
and Pinar also empirically analyzed the mixing of the 
MC using the binary "time-series" traced out by the 
appearance/disappearance of edges between two labeled 
nodes, as the MC executed its random walk in the space 
of graphs. They showed that the autocorrelation of the 
time-series decayed to zero, a necessary condition for the 
MC to converge to its stationary distribution [25] . 

Graph rewiring schemes that preserve DD or JDD are 
simple (and fast) and MC chains driving them are easy 
to construct. However, successive graphs generated by 
the MC are only slightly different and the MC has to 
proceed for a large number of steps N before the initial 
graph is "forgotten." The mixing time estimates in [14] 
take the form of upper bounds and oi 0(\E\ 6 ), where \E\ 
is the number of edges in a graph. Even for small graphs 
with 1000 edges, the bound is intractable. In practice, N 
is chosen arbitrarily. Failure to mix completely leads to 
the generation of correlated samples and any results de- 
rived from them are erroneous. While the method in [27] 
demonstrated the use of autocorrelation to establish mix- 
ing, it is an a posteriori test which provides no a priori 
guidance regarding the length of the MC chain. Further, 
the method requires a small, user-specified threshold, be- 
low which the time-series autocorrelation is deemed zero. 

In this paper, we construct analytical models that esti- 
mate N. These models track the evolution of the binary 
"time-series" formed by the edges of labeled graphs gen- 
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erated by the MC. We test the model under conditions 
where the DD or the JDD is held constant. In Sec. II, 
the model is used to derive an expression for the mix- 
ing time based on when the binary time-series begins to 
resemble independent draws from a distribution. The 
models predict that the mixing time is proportional to 
\E\, the number of edges in the graph. The model holds 
true for a "representative" edge and exploits the con- 
stancy of DD (or JDD) in arriving at its prediction. The 
model is tested with real graphs in Sec. III. In Sec. IV, 
we develop a data-driven method, that assumes neither 
a constant DD (or JDD) nor a representative edge, to 
investigate the independence of the edge time-series. We 
use this test to verify the assumptions made when devel- 
oping mixing time expressions in Sec. II. We determine 
the fraction of the edges for which the model prediction 
of N is an underestimate, and the consequences of the 
lack of stationarity on the ensemble of graphs sampled in 
this manner. 



II. THEORETICAL ANALYSIS 

The goal of this paper is to provide a mathematically 
principled argument for running a MC 0(|E|) steps to 
generate independent graphs with \E\ edges. The con- 
stant hidden in the big-Oh depends on the desired ac- 
curacy. The graphs so generated may have a prescribed 
DD or a JDD; we find that a MC on graphs with a pre- 
scribed DD mixes slightly easier than those where the 
JDD is preserved. Our empirical results show that 5|E| 
- 30|E| steps are sufficient for mixing these MCs. We 
provide a mathematical justification for this observation. 

Consider a MC on the space of graphs and two la- 
beled vertices u and v. Under certain circumstances, the 
existence of an edge (u, v) can also be described by a 
Markov chain. Thus we model the behavior of the MC 
on the space of graphs in terms of a set of coupled 2-state 
Markov chains, each representing edge behavior. We de- 
velop the transition matrix for the (edge) MC, which is 
exact for DD-preserving experiments, but heuristically 
derived for MCs with prescribed JDDs. We prove an 
0(|E|) mixing time for these 2-state Markov chains. This 
ensures that that after 0(|E|) MC steps, each edge be- 
haves as if it comes out of independent draws. While 
this is not sufficient for complete mixing, it is a neces- 
sary condition. 

These mathematical theorems will be empirically val- 
idated in the next section, but we give a quick preview 
of the method. Starting from a real graph, we run a MC 
(driving a DD- or JDD-preserving rewiring scheme) for 
I steps, to create a new, nominally independent graph. 
This is repeated M times to create an ensemble of graphs. 
We plot the distribution of some common graphical quan- 
tity (e.g., the number of triangles in each graph) con- 
structed from the ensemble. This is repeated for different 
values of £ till the plots converge. We find that I ~ 10|E|. 
When I ~ |E|, the MC is far from convergence. 




Step 1: Pick two edges 
(ul,vl) and(u2, v2) 
Uniformly random 



Step 2: Swap edges 



FIG. 1: The DD-preserving swapping operation 
described in Sec. II A. 



Below, we review the rewiring schemes used in the pa- 
per. This is followed by derivations of the models of 
mixing under DD and JDD preservation. 



A. A review of rewiring schemes 

Consider an undirected graph G = (V,E), where | V| = 
n, the number of vertices in the graph, and \E\ = m, the 
number of edges. The degree distribution of the graph 
is given by the vector f, where t(d) is the number of 
vertices of degree d. The joint degree distribution is an 
n x n matrix J, where J(i,j) is the number of edges 
incident between vertices of degree i and degree j. We 
use d v to denote the degree of vertex v. 

In this paper, we use the greedy, JDD-preserving 
rewiring procedure by Stanton and Pinar in [27]. This is 
analogous to the degree distribution preserving chain [10, 
14], illustrated in Fig 1. The JDD-preserving rewiring 
procedure is shown in Fig 2. The procedure does not 
guard against parallel edges or self-loops, but the MC it- 
self can be slightly modified to ensure that these artifacts 
do not occur [10, 27]. We also maintain lists of nodes and 
edges indexed by their degree, so that for a given degree 
d, we can locate a uniform random edge that is incident 
on a degree-d node. The steps in the rewiring scheme 
are: 

• Pick an endpoint uniformly at random. This is done 
by picking a random edge and choosing one of its end- 
points with equal probability. Let this vertex be U\. We 
then choose a uniform random neighbor v of u\, to get 
edge (ui,v). 

• Pick uniformly at random another vertex U2 such 
that d Ul — d U2 , and choose a uniform random neighbor 
w incident to u 2 - This gives the second edge. 

• Swap the edges (ui,v) and («2,w) to create (ux,w) 
and («2, v). 

Details of the rewiring scheme and discussions of the er- 
godicity, mixing of a MC driving such a rewiring scheme 
can be found in [27]. A pictorial description is in Fig. 2. 
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u1 



Step 1 : Pick an edge 
(u1 , v) and pick one of 
its vertices e.g., u1 



1/m) 2 . Note that the values of a and j3 so derived are 
completely independent of the structure of the rest of the 
graph, and so the transition probabilities in Eq. 1 satisfy 
the Markov property. □ 



Modeling mixing when joint degree distribution 
is preserved 





Step 2: Pick a second 
edge (u2, w), where 
d(u1) = d(u2)ORd(u1) 
= d(w) 




Step 3: Swap edges 



FIG. 2: The JDD-preserving swapping operation 
described in Sec. II A. 



B. Modeling mixing when degree distribution is 
preserved 

Consider a MC M. that generates graphs with a pre- 
scribed degree distribution. Consider two labeled vertices 
it and v, with degrees d u and d v . We will show that the 
appearance/disappearance of (it, v) in M. can also be de- 
scribed by a Markov chain. 

Claim 1. The transition matrix of the Markov chain for 
the edge (u, v) is given by 



T(d u ,d v ) 



l-a(d u ,d v ) a(d u ,d v ) 
/3(d u ,d v ) l-f3(d u ,d v ) 



where a{d u , d v ) = j*4t, fi(d u , d v ) = 1 - (1 
the first index in T is state (no edge). 



1/m) 2 , and 



Proof. Assume that the edge (it, v) does not initially ex- 
ist. We analyze the probability of it appearing after one 
step. Let e and e' be the two edges chosen by the rewiring 
algorithm, (it, v) will appear if it is the first endpoints of 
e and v is the last endpoint of e' (or vice versa). This 
leads to four possible cases. Consider the case when both 
the vertices are the first endpoints of e and e'. The prob- 
ability of this event is d u /2m x d v /2m — d u d v /Am 2 . The 
other case is symmetric and so a{d Ul d v ) = d u d v /2m 2 . 

Suppose that (it, v) does exist but is removed during 
the Markov transition. This occurs if (it, v) is either e or 
e' . The probability of this event is (3td Ul d v ) = 1 — (1 — 



We consider a pair of labeled vertices (it, v) and 
model the appearance/disappearance of the edge during 
a Markov transition. For convenience, we will assume 
that f(d u ) and f(d v ) are both strictly greater than 1. 
The proofs can be extended to these cases as well. (Ob- 
serve that when i(d u ) — f(d v ), the edge (it, v) is always 
present because the JDD is preserved.) 

Claim 2. Assume edge (u, v) is present and is removed 
using the transition. The probability of this event is 



f3(d u ,d v ) = h 



1 f(d u )-l f(d v )-l 



m 2mi(d u ) 2mt(d v ) 



(1) 



Proof. We follow the JDD-preserving swapping proce- 
dure described in Sec. II A. Let e and e' be the two edges 
chosen for swapping. If (u, v) is e, it will definitely be 
swapped. The probability of the chosen edge being (it, v) 
is 1/m. 

The other alternative is that (it, v) is e'. This can hap- 
pen only if the first endpoint chosen has degree d u (or d v ) 
and then (u, v) is chosen. Consider the first case. The 
total number of endpoints on degree d u vertices (exclud- 
ing u) is (f (du) — l)d u . The probability that e is chosen is 
the probability that u is chosen as the second endpoint, 
and then (it, v) is chosen. This is equal to l/(f (d u ) xd u ). 
The total probability is 



(f (du) - l)d u 

2m 



1 



= (f (du) - iK = gK) - 1 

i{d u )d u 2mt(d u )d u 2mt(d u ) 



Symmetrically, the randomly chosen endpoint could 
have had degree d v . So the total probability of choos- 
ing e' = (u, v) is 



f(du 



1 



f(^) 



1 



2mi(d u ) 2m{(d v ) 



and the probability of : 



being removed is 



f3(d u ,d v ) = h 



1 f(d u )-l f(d v )-l 



2mi(d u ) 2ml (d v ) 



□ 



This expression has certain ramifications. Since the 
JDD is preserved (and by implication, ((d)), the proba- 
bility of edge removal does not vary as the Markov chain 
proceeds. This satisfies the Markov property and f3 de- 
pends only on u and v i.e., it is independent of the rest 
of the graph. 
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Consider the probability a of (u, v) appearing. This 
is not Markovian, as it depends on the structure of the 
graph. However, this dependence is weak, and we can ob- 
tain a Markovian approximation using a simple heuristic. 

Consider the number of edges 3{d,d v ) between ver- 
tices of degree d v and an arbitrary d. The expected 
number of such edges incident on a degree d v vertex is 
3(d, d v )/f (d v ). The number of edges between a vertex v 
and degree d vertices depends on the graphical structure; 
however, we will approximate it with the expected value 
J(d,d v )/f(d v ). 

Claim 3. // (it, v) is not present at a stage in the Markov 
chain, the probability a that it appears in the next step is 
given by 



a(d u ,d v ) 



23(d u ,d v ) 
mf(d u )f(d v ) 



(2) 



Proof. As before, let the two edges chosen by the al- 
gorithm be e and e', in that order. The edge (u, v) is 
swapped in when either of the four following situations 
happens. Vertex u is the first (resp. second) endpoint 
of e and v is the second (resp. first) endpoint of e'. Or, 
vertex u is the first (resp. second) endpoint of e' and v 
is the second (resp. first) endpoint of e. 

u is the first endpoint of e and v is the second endpoint 
of e' (or vice versa). 

Consider the following sequence of events. First, u is 
chosen as an endpoint (and we get edge e). Then, we 
choose a vertex v! (who is a neighbor of v) whose degree 
is the same as u. Finally, we select edge e' = (it', in- 
swapping will lead to edge (u, v). The probability of the 
first event is d u /2m. The probability of the second event 
depends the number of neighbors of degree d u incident 
to v. Alternately, this is the number of edges connecting 
v to degree d u vertices. Clearly, this depends on the 
graph structure, but our approximation of this quantity 
is 3{d u , d v )/f(d v ). 

So, the second probability is 3(d u ,d v )/(i(d v )f(d u )) 
(because there are f(d u ) neighbors of degree d u , of which 
3(d u ,d v )/f(d v ) are neighbors of v). The last probability 
is l/d u . The total probability is 



d u 3(d u ,d v ) 1 

V i y/ 

2m f(d u )f(d v ) d u 



3{d Ul d v ) 
2mf(d u )f(d v ) 



Now consider the second case. First, v' (a neighbor of 
u whose degree is d v ) is chosen as an endpoint. Then, 
u is chosen as the neighbor of v', so e = (u, v'). Then, 
v is chosen as the first endpoint of the second edge e' . 
Swapping will again lead to edge (u, v). The first prob- 
ability is approximated by 3(d u ,d v )/(f(d u ){(d v )). The 
second probability is l/d v , and the last is d v /2m. The 
probability comes out to be the same as earlier. 

By symmetry, we get the probabilities of the other 
cases by switching the roles of u and v, yielding the same 
expression. □ 



D. Estimating the mixing time 

Having developed expressions for a and j3 in Eq. 1, 
under the assumption of prescribed DD (Sec. II B) and 
JDD (Sec. IIC), we use the Markov transition matrix to 
estimate the number of steps N to achieve a stationary 
distribution, conditional on a tolerance e. 

Claim 4. Choose A as follows. Let the final distribution 
after running Ai u . v for N steps be p. Then ||p — 7r u .„|| < 
e. 



A 



(m/2)ln(l/e) 
m ln(l/e) 



DD preserving MCs 
JDD preserving MCs 



The distribution 7r U)0 is the probability of exis- 
tence/absence of edge (u,v) once M. has converged. This 
implies that after A — mln(l/e) we are very close to the 
converged distribution of all edges, irrespective of the 
degrees of their terminating vertices. 



Proof. T has two eigenvalues, 1 and 1 — (a(d u ,d v ) + 
f3(d u ,d v )). Let the corresponding unit eigenvectors be 
e! and e 2 . For notational convenience we will refer to 
a(d u ,d v ) and /3(d u ,d v ) as a and /3. Since a + > 0, 
the eigenvectors form a basis. We represent the initial 
state of the MC as v = ciei +€2^2- After N applications 
of the transition matrix, the distribution of the Markov 
chain's state is 



p = 1 v = ci 1 ei + c 2 1 e 2 

= ciei + c 2 (1 - {a + 0)) e 2 



(3) 



Since (1 — (a + /?)) < 1, the second term decays with 
A, and the stationary distribution asymptotically ap- 
proaches CiB\. The 2-norm of the decaying term (the 
error incurred when we stop at finite N) can be bounded 
for 

N = ]n{l/e)/(a + P) 
in the following manner 



\ln(l/e)/ 7 



C2 e 2 2 



||p -*■„,„ || = ||(l-7) c 2 e 2 ||2 < (1-7) 
< exp(-ln(l/e)) = e 

where we have used 7 = (a + 0) and 



mill 
A = — In - > — In - , for DD-prcscrving MC and 
2 e 7 e 

N = mln->— In-, for JDD-preserving MC. (4) 

£7e 

The key observation is 7 > 2/m for DD-preserving MC 
(see values of a and j3 in Sec. II B) and 7 > 1/m for a 
JDD-preserving MC (see Sec. IIC). □ 
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III. TESTS WITH REAL GRAPHS 

In this section we perform empirical studies to find a 
suitable value of e, i.e., one which can be used to gen- 
erate samples of graphs which are statistically indistin- 
guishable from those generated by a smaller e. We dis- 
criminate graphs using certain graphical metrics. 

We run the MC with the DD-preserving rewiring 
scheme for N steps before saving the resultant graph. 
We start the MC using 4 real graphs - the neural net- 
work of C. Elegans [30] (referred to as "C. Elegans"), 
the power grid of the Western states of US [30] (called 
"Power"), co-authorship graph of network science re- 
searchers [18] (referred to as "Netscience" ) and a 75,000 
vertex graph of the social network at Epinions.com [23] 
("soc-Epinionsl"). Their details are in Table I. The first 
three were obtained from [17] while the fourth was down- 
loaded from [26] . All the graphs were converted to undi- 
rected graphs by symmetrizing the edges. 

We use N = {0.5, 2.5, 5, 7.5}|i?| corresponding to e = 
{0.37, 6.7x 10" 3 , 4.5 x 10" 5 , 3.06x 10~ 7 }. 1000 graphs are 
generated in this manner, starting with the real graphs 
in Table I. Thereafter, we calculate the global clustering 
coefficient (three times the ratio of the number of trian- 
gles to wedges in a graph) [2], the diameter, and the maxi- 
mum eigenvalue of the Laplacian matrix of the graph [16] 
for each of members of the ensemble of 1000 graphs. In 
Fig. 3, we plot these distributions for three of the four 
graphs in Table I. We clearly see that for e = 0.37 i.e., 
N = 0.5\E\, the distributions for all graphical metrics 
are quite different from the distributions obtained with 
N > \E\. The red and black curves, corresponding to 
N = 5\E\ and N — 7.5\E\ are mostly indistinguish- 
able indicating that the MC has converged to its sta- 
tionary distributions. We will henceforth proceed with 
e = 4.5 x 10~ 5 , achieved with N = 5\E\ when the DD is 
preserved during the MC sampling. 

We now turn our attention to the case when the 
JDD is preserved during MC sampling. We repeat the 
same sampling procedure as above, but with the JDD- 
preserving rewiring scheme being driven by the Markov 
chain. We choose the same sequence of e, corresponding 
to N = {1, 5, 10, 15}\E\, per Eq. 4. The distributions of 
global clustering coefficients, graph diameter and max- 



TABLE I: Characteristics of the graphs used in this 
paper. (|V|, \E\) are the numbers of vertices and edges 
in the graph and G-R diagnostic is the Gelman-Rubin 
diagnostic. 



Graph name (|V|, \E\) G-R diagnostic 

C. Elegans (297, 4296) 1.05 

Netscience (1461, 5484) 1.02 

Power (4941, 13188) 1.006 

soc-Epinionsl (75879, 405740) 1.06 



imum eigenvalue of the graph Laplacian are in Fig. 4. 
Again, it is clear that the distribution (for any of the 
graphical metrics) is far from convergence for e = 0.37 
but indistinguishable for the other values of e. In par- 
ticular, the red and black plots lie on top of each other 
in most of the subfigures, indicating that N = 10\E\ 
(e = 4.5 x 10 -5 ) is generally sufficient to drive the MC to 
its stationary distribution, as characterized by the afore- 
mentioned graphical parameters. 

In summary, a tolerance of e = 4.5 x 10 -5 provides con- 
verged/stationary distributions of global clustering coef- 
ficient, graph diameter and maximum eigenvalue of the 
graph Laplacian. This corresponds to running the MC 
on the space of graphs for N = 5\E\ steps when DD is 
preserved and for N = 10\E\ steps when JDD is pre- 
served. These results provide an empirical proof that the 
models in Sec. II B and Sec. II C provide a good a priori 
estimate of the steps to run a MC to obtain stationary 
results. 



IV. VERIFICATION OF MODEL 
ASSUMPTIONS 

In this section, we check our assumption that a MC on 
the space of graphs can be approximated by coupled 2- 
state Markov chains. We also devise a test to ensure that 
the MC does not converge to a local mode in the distri- 
bution of graphs. This can occur in case of multimodal 
distributions. 



A. Verifying edge-by-edge convergence 

The equation for N (Eq. 4) is based on a heuristic 
and has to be verified. Further, the equation is strictly 
valid for an edge and it is unlikely that after N steps of 
the MC, all the edges have decorrelated and converged 
to their stationary distribution. The residual number of 
correlated edges have to be identified and their impact 
on graphical properties quantified. 

Consider the behavior of an arbitrary edge (between 
labeled vertices) during a long walk executed by a MC 
on the space of graphs. During this walk, the edge ap- 
pears and disappears, thus describing a binary time-series 
{Z t }. If the MC on the space of graphs reaches a sta- 
tionary distribution in 0(|E|) steps, then thinning the 
binary time-series by a factor of 0(|E|) should result in 
a time-series that resembles independent draws of zeros 
and ones. We develop a nonparametric test of indepen- 
dence for such time-series. This method exploits Sokal's 
observation that if an MC reaches a stationary distribu- 
tion, the samples it draws are uncorrelated i.e., the auto- 
correlation is small [25]. It is general and does not depend 
on the preservation of DD or JDD in the graphs. While 
this test has been in existence for some time [20, 21], it 
is not well known in the network generation literature. 
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FIG. 3: Plots of the distributions of the global clustering coefficient, the diameter, and the maximum eigenvalue of 
the graph Laplacian for "C. Elegans" (left), "Netscience" (middle) and "Power" (right), evaluated after 
. 5 1 ^Zi7 1 , 2.5|-E|, 5\E\ and 7.5|i?| iterations of the Markov chain (green, blue, black and red lines respectively). The 
corresponding values of e are in the legend. We see that the distributions converge at e ~ le — 5. In these runs, the 

DD was held constant across all graphical samples. 



Assume that the time-series {Z t } is very long i.e., it 
takes K 3> N steps. The time-series will be correlated 
as observed in [27]. Consider too, a fc-thinned chain 
{Z^}, obtained by retaining every k th member of {Z t }. 
The thinned chain will have a far smaller autocorrelation 
and upon further thinning will begin to resemble inde- 
pendent draws from a distribution. If Eq. 4 is correct, 
k = N should yield a time-series that resembles inde- 
pendent draws more than a first-order Markov process. 
This forms the basis of our test of independence. Wc 
fit models of two processes, independence and first-order 
Markov, to the data and compute the log-likelihood. The 
"goodness" of model fit is determined by computing the 
Bayesian Information Criterion (BIC). This paper is the 
first application of this technique to graphs, though it 
has been used in other contexts. 

Let Xij be the number of € (0, 1) transitions 



in the fc— thinned chain {Z^}. The are used to pop- 
ulate X, a 2 x 2 contingency table. The table entries are 
normalized by the length of the fc-thinned chain (K/k—1) 
to provide us with the empirical probabilities p,j of an 
transition in {Z^}. Let pTj and Xij = (K/k — l)pi} 
be the predictions of expected values of the table en- 
tries provided by a model. Then the goodness-of-fit of 
the model can be provided by a likelihood ratio statis- 
tic (called the G 2 -statistic; Chapter 4.2 in [4]) and a 
Bayesian Information Criterion (BIC) score: 

i=oi=o V^w/ 
BIC = G 2 + q\og(^-l) , (5) 
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FIG. 4: Plots of the distributions of the global clustering coefficient, the diameter, and the maximum eigenvalue of 
the graph Laplacian for "C. Elegans" (left), "Netscience" (middle) and "Power" (right), evaluated after 
\E\, 5\E\, 10\E\ and 15|i?| iterations of the Markov chain (green, blue, black and red lines respectively). The 
corresponding values of e are in the legend. We see that the distributions converge at e ~ le — 5. In these runs, the 

JDD was held constant across all graphical samples. 



where q is the number of parameters in the model used to 
fit the table data. Log-linear models are generally used 
to model tabular data (Chapter 2.2.3 in [4]). The log- 
linear models' predictions for table entries generated by 
independent sampling and a first-order Markov process 
are 



Iog(?S?) 



,(M) 



(M) 



u (I) 

,(M) 

,00 



and 



u„ 



AM) 



(6) 



where superscripts /, M indicate an independent and a 
Markov process respectively. The maximum likelihood 
estimates (MLE) of the parameters {ujffl) are available 
in closed form in literature (Chapters 2.2.3 and 3.1.2 



in [4]) and we reproduce them below: 



(x i+ )(x 



x++ 



and xM 



(7) 



where and x+j are the sums of the table entries in 
row i and column j respectively. x++ is the sum of all 
entries (i.e., K/k — 1, the number of transitions observed 
in {Z^}, or the total number of data points). We compare 
the fits of the two models thus: 

ABIC = BIC (I) - BIC {M) 

i=l i=l 




Above, we have substituted x\j = Xij and the fact that 
the log-linear model for a Markov process has one more 
parameter than the independent sampler model. Large 
BIC values indicate a bad fit. A negative ABIC indicates 
that an independent model fits better than a Markov 
model. 

This test is applied as follows. We construct a thinned 
binary time-series {Z^} for k = \E\ for each of the edges. 
The ABIC is computed and edges with negative ABIC 
are deemed to have become independent after k steps of 
the Markov chain. The test is repeated with a higher k 
till all the edges are deemed to have become independent. 
A CH — h implementation of this test, using a binary time- 
series as input, is available at [22]. 

However, a difficulty appears at this point. We see 
that the accuracy of the log-linear models in Eq. 6 and 7 
depend on the length of the chain being tested i.e., if the 
chain is too short, the ratio of likelihoods and the BIC 
will be wrongly calculated. Thus, before we populate the 
contingency table, we ensure that the edge time-series is 
long enough. 

Consider an edge time-series obtained from the K-step 
MC. One can empirically compute the edge-mean Z t i.e., 
the probability of existence of the edge. An estimate of 
the same quantity can be obtained from a k— thinned 
version of the chain i.e., Z^. If the thinned time-series 
are sufficiently long, then the estimates of the edge mean, 
Z t fc , computed for different values of k, approximately 
describe a normal distribution, with mean q (the edge 
mean) and a variance a 1 . 

We wish to ensure that a fc-thinned MC is sufficiently 
long, i.e., the empirical edge- mean Z\ can be estimated 
to lie inside q ± r with confidence s i.e.P(q — r < Z£ < 
q + r) = s, or 

(^-Uo.su + s)}) =<j2 (9) 

where $ is the cumulative distribution function for a 
standard normal distribution and r is a tolerance on the 
empirical edge- mean calculated using a fc-thinned MC. 

Consider that thinning the K-step time-series by a fac- 
tor k' renders it a first-order MC (see Appendix A on 
how this may be done). The contingency table entries 
(Eq. 7) then indicate the number (or proportions) of 0-1 
and 1-0 transitions of the 2-state Markov chain model of 
the time-series. It is then trivial to compute the entries 
a and /3 of the Markov transition matrix. If n' is the 
length of the fc'-thinned time-series, then the variance of 
the Z\ , a 2 can be written in terms of a, f3, 

2 _ a/3(2-a-P) 
a ~ n'(a + /3) 3 



and so, using Eq. 9, 

q/3(2-q-/3) 

n'= ia+ f )3 (10) 

^*-!{0.5(l+s)} J 

i.e., the /c'-thinned time-series must be at least n' in 
length to provide an estimate Z\ of q (computed from 
the -fT-step time-series) within tolerance r with the re- 
quired confidence s. In this paper, we use r = 0.01 and 
s = 0.95. 

We now apply this test to a Markov chain on graphs 
initiated with those mentioned in Table I. In Fig. 5 we 
see the thinning factors required to render the fc-thinned 
time-series {Z^} resemble independent draws from a dis- 
tribution. In Fig. 5 (left) we see that thinning by 4\E\ is 
sufficient to render {Zj?} independent, for all three graphs 
tested, when the DD is preserved by the MC. In Fig. 5 
(right) we see results for the JDD-preserving counterpart. 
It is clear that higher thinning factors are required for in- 
dependence, but a thinning factor of 10\E\ seems to be 
sufficient. Thus empirically, the factor-of-two difference 
in N, as predicted by the models in Eq. 4, and as seen 
empirically in Fig. 3 and Fig. 4, are corroborated by a 
separate test that does not assume DD or JDD preserva- 
tion. 

Convergence of distributions of global clustering coef- 
ficients, diameter and maximum eigenvalues do not auto- 
matically indicate that the time-series of all the edges in 
a graph resemble independent draws. This is clearly seen 
in Fig. 4 for the graph "Power" (bottom row). In Fig. 4, 
N ~ 10 1 £7 1 (e = 4.5 x 10 -5 ) was sufficient to construct 
distributions that do not change appreciably for greater 
N. However, from Fig. 5, we see that about 2% of the 
edges are still not independent, but presumably close to 
being so, since about 15|i?| steps of the MC result in all 
edges becoming independent. 

In order to check this issue, we performed a test with 
the soc-Epinionsl graph. We generate an ensemble of 
1000 graphs using N = 30\E\ and the method in Sec. II C 
i.e., where the JDD of the initial graph was preserved as 
samples were generated. We also run a long K-step MC 
(K = 21.6 x lO 6 ^!) and compute the thinning factor 
k required to render each edge's time-series similar to 
independent draws. Since the graph has 405,740 edges, 
the independence test was performed for only 10% of the 
edges, chosen randomly from the initial graph. In Fig. 6 
we plot a histogram of k/\E\. We clearly see that while 
by k = 3Q\E\ about 90% of the edges have become in- 
dependent, there are quite a few "pathological" edges 
that are very far from becoming independent (one such 
edge becomes independent at k = 720|£'|). In Fig. 7 we 
plot the distributions for the diameter and the maximum 
eigenvalue for N = 30\E\, 150|£|, 270|£| and 390|£|. We 
see that the two distributions, while not identical, do not 
change much. Thus distributions of the chosen metrics 
are robust (insensitive) to an ensemble which might con- 
tain graphs with a few "hard-to-converge" edges. Thus, 
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FIG. 5: Fraction of edges testing independent, for "C. Elegans", "Netscience" and "Power" for various values of the 
thinning factor k. Left: Results for the case when DD is preserved. We see that with a thinning factor of k — 4\E\ 
all the edges have converged to their stationary distribution. Right: Results for the case when JDD was preserved. 
We see that k = 1Q\E\ ensures that at least 95% of the edges become independent. 



if one is interested in generating proxy graphs where 
anonymity is critical (e.g. proxies of email traffic in an 
organization), an exhaustive, edge- by-edge checking of 
independence, using the method described above, may 
be necessary. On the other hand, when one only desires 
a collection of roughly independent graphs with a pre- 
scribed DD or JDD, the far simpler approach of Sec. II B 
or II C may suffice. 



B. The Gelman- Rubin test for convergence of 
Markov chains 



Finally we address the question whether the MC gen- 
erating the ensemble of graphs is sampling from a par- 
ticular mode of the distribution of graphs (in case the 
distribution is multimodal). This can, in principle, occur 
since the MC is always started from the same starting 
graph. Convergence to the stationary distribution (and 
not just to one mode) can be tested by starting the MC 
from an overdispersed set of points and checking whether 
the same distributions are realized, irrespective of the 
starting point. In our case, we will choose three points 
and perform this test only for the JDD-preserving case. 

In order to generate a set of overdispersed starting 
graphs, we run a Markov chain for N = 10, 000|i?| steps, 
after initializing it with a real network. This serves as the 
second over-dispersed starting point for the MC and the 
initial condition for the third point (realized after another 
10,000|i?| steps). Three concurrent MC are initialized 
with these graphs, and we calculate the Gelman-Rubin 
(G-R) diagnostic [8] using the binary edge time-series. 
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FIG. 6: A histogram of k/\E\, where k is the number of 
steps the MC has to take to render the time-series of a 
particular edge resemble independent draws, calculated 

for the soc-Epinionsl graph. We see that while 30\E\ 
steps render about 90% of the edges independent, there 

are a few edges which are still very far from becoming 
independent. These results are for a case where the JDD 
was preserved. 



Values of the diagnostic between 1 and 1.1 indicate that 
the states of the concurrent Markov chain are not depen- 
dent on the starting location. We performed this test 
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FIG. 7: Plots of the distribution of graph diameter (left) and maximum eigenvalue of the graph's Laplacian (right) 
for N = 30|-B|, 150|i?|, 270|£'| and 390 1 £7 1 , for the soc-Epinionsl graph (green, blue, black and red lines respectively). 
We see that distributions are very similar, even though, from Fig. 6, it is evident that a few edges are far from being 
independent. Thus these two graphical metrics are not very sensitive graphs that are almost, but not quite, 
independent. These results are for a case where the JDD was preserved. 



for all 4 graphs; the corresponding G-R diagnostics are 
tabulated in Table I. 



V. CONCLUSIONS 

We have developed a method for generating inde- 
pendent realizations of graphs which share a particular 
graphical property. In this paper, we have demonstrated 
the method in the case where (1) the degree distribution 
was preserved over all realizations and (2) where the joint 
degree distribution was held constant. The graphs are 
generated using a MC approach, which drives a graph 
"rewiring" scheme. The rewiring scheme is responsible 
for preserving the shared graphical properties when it 
generates a new graphical realization, from an old one. 
Our method involves running the Markov chain for N 
steps before extracting a graph; the chain is run repeat- 
edly to generate an ensemble of realizations, given an ini- 
tial graph. We have developed models and closed form 
expressions for N that allows a coupled 2-state Markov 
chain of an edge to converge to its stationary distribu- 
tion. This is a necessary, but not sufficient, condition for 
the Markov chain on the space of graphs to "forget" the 
starting graph and generate an independent realization. 
We find that a Markov chain requires 5\E\ — 30\E\ steps 
to mix fully, i.e., to provide converged distributions of 
graphical properties like the global clustering coefficient, 
graph diameter and maximum eigenvalues of the graph 
Laplacian. We find that the Markov chains that preserve 
the degree distribution are easier to mix than those where 
the joint degree distribution is held constant. 



We verified our model (for N) by performing a test for 
the independence of the Markov chain described by any 
edge between two labeled vertices in the graph, as the 
Markov chain on the space of graphs executes its ran- 
dom walk. This test is not dependent on any heuristic 
or graphical properties. The test uses the time-series of 
occurrence/non-occurrence of an edge between labeled 
vertices, thins them by a factor of N, and fits it with a 
first-order Markov and an independent sampling model. 
The goodness of fit of the two models, determined by 
their individual BICs, is used to select between them. 
The Markov chain is considered mixed for any iV for 
which the independence sampling model is selected over 
the Markov model. We find that even for ./V that pro- 
vide converged distributions of the graphical properties 
above, a small fraction of the edges may still be signif- 
icantly correlated (as determined by the independence 
test). The robustness of the graphical properties indi- 
cate that if fully independent graphical realizations are 
desired, the laborious process of testing each edge may 
be unavoidable. 

Finally, we repeated our tests by starting parallel 
Markov chains from widely dispersed starting points and 
computing the Gelman-Rubin diagnostic for them. We 
find that the parallel chains converge to the same sta- 
tionary distribution. This check was performed to ensure 
that our distributions of graphical properties was not be- 
ing driven by the starting point of the Markov chain. 

Apart from the theoretical results summarized above, 
we also provide an open-source CH — h implementation of 
the non-parametric test, along with sample problems to 
illustrate its use [22]. 
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Appendix A 

Here, we address how one may thin a long, A-step 
time-series so that the thinned chain resembles a first- 
order Markov process. It follows the arguments in 
Sec. IV A. We consider a binary time-series and popu- 
late a 2 x 2 x 2 contingency table of transitions that one 
might observe in a sequence of length three. The con- 
tingency table is fitted with log-linear models for second- 
and first-order Markov models (see Sec. 7.3.1 in [4] for 
second-order Markov models), and the G 2 statistic calcu- 
lated along with the difference in the BICs arising from 



second-order and first-order Markov model fits. 

Initially, the high-order Markov model fits to data bet- 
ter than the first-order model. However, as the A-step 
time-series is thinned, and the correlation decays, the fit 
of first-order model, vis-a-vis, the second-order model im- 
proves. At a particular thinning factor k' the first-order 
model fits better than the second order one (the differ- 
ence in BICs favors the first-order model) and we obtain 
a first-order Markov chain. 

In the software that we have released with this pa- 
per [22], the C function mctestO checks the fit of a 
time-series to first- and second-order Markov chains, re- 
turning the ratio of likelihoods and the difference in the 
BICs of the two models. The C function indtestO 
does the same, but between a first-order Markov model 
and independent draws from a binary distribution. Both 
the functions construct the contigency table from the bi- 
nary time-series supplied via the function arguments, and 
compute the expected values of the table entries (using 
log-linear models calibrated to table data). In contrast, 
the function mcestO estimates the transition probabili- 
ties a and /3 from a binary time-series, being modeled as 
a first-order Markov process. Examples of the use of the 
tests (mctest and indtest) are illustrated in the soft- 
ware package associated with this paper (in exOl/). The 
use of all the functions, in a MC on graphs, is in ex02/. 
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